-
Notifications
You must be signed in to change notification settings - Fork 11
More receiving yards parsing cases #120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Removed deprecated labels for elapsed time columns in cfbd_drives and added new columns to cfbd_live_plays documentation and tests. Updated test expectations to use expect_in instead of expect_setequal for column checks.
Added new columns to cfbd_play_stats_player output, improved sack player aggregation, handled NULL values, and updated documentation and examples to reflect changes. Also updated cfbd_live_plays documentation to include new columns for average start yard line and deserve to win metrics.
Changed cfbd_pbp_data to assign 3 timeouts per half for offense and defense when timeout data is missing from the API. Updated documentation and examples to reflect this behavior.
Added .groups = "drop" to the dplyr::summarise call in add_play_counts to control grouping behavior and prevent potential warnings in future dplyr versions.
Removed the specific count of variables from the return value description in both R and Rd files to improve maintainability and accuracy as the data frame structure may change.
… function usage in play data functions Corrected spacing and replaced superseded `dplyr::distinct_all()` with `dplyr::distinct()`, and standardized assignment spacing for improved code readability and consistency.
Added a check to filter out games with fewer than 20 plays in the play-by-play data processing. This helps avoid issues with EPA/WPA models and improves data validation.
Update package version to 2.1.0. Add release notes for bug fixes in `cfbd_pbp_data()` and improvements to `add_yardage()` handling missing yardage values. Update cran-comments to reflect minor release and changes.
Added normalization for 'seasonType' to 'season_type' in cfbd_stats_game_advanced. Updated tests to check for column inclusion with expect_in instead of expect_setequal, and added team ID columns to betting lines test.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughAdds a new exported play-text preprocessing function Changes
Sequence Diagram(s)sequenceDiagram
participant cfbd_pbp_data as cfbd_pbp_data()
participant clean_fn as clean_play_text()
participant addyard as add_yardage()
participant penalty as penalty_detection()
participant epa as EPA/WPA_calc
cfbd_pbp_data->>clean_fn: pass play_df (per game)
clean_fn-->>cfbd_pbp_data: return play_df with cleaned_text
cfbd_pbp_data->>addyard: pass play_df with cleaned_text
addyard-->>cfbd_pbp_data: return play_df with yds_rushed/yds_receiving
cfbd_pbp_data->>penalty: pass augmented play_df
penalty-->>cfbd_pbp_data: return play_df post-penalty handling
cfbd_pbp_data->>epa: compute EPA/WPA using processed play_df
epa-->>cfbd_pbp_data: EPA/WPA results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
R/helper_pbp_add_yardage.R (1)
44-56: Document the dependency oncleaned_textcolumn.The function now relies on a
cleaned_textcolumn (used starting at line 58) but doesn't validate its presence or document this requirement. Consider either:
- Adding parameter documentation that
play_dfmust contain acleaned_textcolumn- Adding a validation check at the start of the function
- Creating the column if it doesn't exist by calling
clean_play_text()🔎 Example validation check
add_yardage <- function(play_df) { + if (!"cleaned_text" %in% names(play_df)) { + stop("play_df must contain a 'cleaned_text' column. Call clean_play_text() first.") + } + play_df$yds_rushed <- NA_real_ play_df$yds_receiving <- NA_real_
🧹 Nitpick comments (1)
R/cfbd_pbp_data.R (1)
2168-2179: Consider refactoring for efficiency and completeness.The function could be improved in several ways:
- Efficiency: Multiple
str_replacecalls that reassign to the same column are less efficient than chaining operations or usingstr_replace_allwith multiple patterns.- Whitespace handling: Consider adding
str_trim()at the end to remove leading/trailing whitespace that may result from the replacements.- Pattern consolidation: Some patterns could potentially be combined (e.g., the two "No Huddle" patterns on lines 2174-2175).
💡 Example refactored implementation
clean_play_text <- function(play_df) { play_df <- play_df %>% dplyr::mutate( cleaned_text = .data$play_text %>% stringr::str_replace("^\\(\\d{1,2}:\\d{2}\\)\\s+", "") %>% stringr::str_replace("\\s\\b(short|deep)\\b\\s", " ") %>% stringr::str_replace("\\s\\b(left|middle|right)\\b\\s", " ") %>% stringr::str_replace("\\s*No Huddle-Shotgun\\s+", "") %>% stringr::str_replace("No Huddle-?", "") %>% stringr::str_replace("\\s*Shotgun\\s+", "") %>% stringr::str_replace("\\s+", " ") %>% stringr::str_trim() ) }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
DESCRIPTIONNAMESPACER/cfbd_pbp_data.RR/helper_pbp_add_yardage.RcfbfastR.Rprojman/helpers_pbp.Rdtests/testthat/test-cfbd_pbp_data.R
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-25T01:24:22.513Z
Learnt from: saiemgilani
Repo: sportsdataverse/cfbfastR PR: 113
File: R/cfbd_stats.R:381-381
Timestamp: 2025-08-25T01:24:22.513Z
Learning: In cfbfastR's cfbd_stats_season_advanced() function, the gsub("Opportunies", "_opportunities", colnames(df)) is intentional and correct for handling the actual API response column naming from CollegeFootballData.com, not a typo as it might appear.
Applied to files:
R/helper_pbp_add_yardage.RR/cfbd_pbp_data.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: ubuntu-latest (release)
- GitHub Check: windows-latest (release)
- GitHub Check: ubuntu-latest (oldrel-1)
🔇 Additional comments (6)
DESCRIPTION (1)
61-61: LGTM! Appropriate dependency addition.Adding
patrickfor parameterized testing aligns with the test updates mentioned in the PR summary.NAMESPACE (1)
70-70: LGTM! Appropriate NAMESPACE updates.The new export and import declarations correctly support the
clean_play_text()function.Also applies to: 152-152
man/helpers_pbp.Rd (1)
11-11: LGTM! Complete documentation for the new function.The documentation properly describes
clean_play_text()including its purpose, usage, and return values. The note about ESPN PBP changes in 2025 provides helpful context.Also applies to: 26-26, 181-184, 342-345
R/cfbd_pbp_data.R (1)
598-598: LGTM! Appropriate pipeline placement.Calling
clean_play_text()beforepenalty_detection()ensures the text is cleaned early in the processing pipeline.R/helper_pbp_add_yardage.R (1)
157-238: Inconsistent use ofcleaned_textvsplay_textacross yardage extractions should be clarified.The
yds_rushedandyds_receivingcalculations usecleaned_text(lines 58–150), whileyds_int_return,yds_kickoff,yds_punted,yds_fumble_return, andyds_sackedstill useplay_text(lines 157–238).The
cleaned_textfunction removes pass depth/direction qualifiers (short/deep/left/middle/right), clock timestamps, and huddle/shotgun markers. These removals likely benefit rush and receiving patterns but may not affect special teams or interception patterns, which use different regex anchors. However, this design decision lacks documentation. Either:
- Add a comment explaining why only rush/receiving need
cleaned_textand others don't- Apply
cleaned_textconsistently across all yardage extractions for uniformity- Document in the function's roxygen comments which yardage types require text cleaning
tests/testthat/test-cfbd_pbp_data.R (1)
57-59: No action needed. The test filtering mechanism will work correctly.The
cfbd_pbp_data()function preserves the originalplay_textcolumn. Theclean_play_text()function creates a separatecleaned_textcolumn viadplyr::mutate()without modifyingplay_text. The exact string matching on line 57 using the full timestamp strings like"(14:46) Shotgun #10 H.King..."will find the intended plays as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
R/helper_pbp_add_yardage.Rtests/testthat/test-cfbd_pbp_data.Rtests/testthat/test-cfbd_play_stats_types.Rtests/testthat/test-cfbd_play_types.R
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/testthat/test-cfbd_pbp_data.R
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-25T01:24:22.513Z
Learnt from: saiemgilani
Repo: sportsdataverse/cfbfastR PR: 113
File: R/cfbd_stats.R:381-381
Timestamp: 2025-08-25T01:24:22.513Z
Learning: In cfbfastR's cfbd_stats_season_advanced() function, the gsub("Opportunies", "_opportunities", colnames(df)) is intentional and correct for handling the actual API response column naming from CollegeFootballData.com, not a typo as it might appear.
Applied to files:
R/helper_pbp_add_yardage.R
📚 Learning: 2025-08-25T01:47:09.915Z
Learnt from: saiemgilani
Repo: sportsdataverse/cfbfastR PR: 113
File: R/cfbd_play.R:731-733
Timestamp: 2025-08-25T01:47:09.915Z
Learning: In the cfbfastR codebase, the pattern `tibble::tibble(data = .data$.)` in R/cfbd_play.R is valid and should not be flagged as incorrect usage of the .data pronoun.
Applied to files:
tests/testthat/test-cfbd_play_stats_types.Rtests/testthat/test-cfbd_play_types.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: windows-latest (release)
- GitHub Check: ubuntu-latest (oldrel-1)
- GitHub Check: ubuntu-latest (release)
🔇 Additional comments (5)
tests/testthat/test-cfbd_play_stats_types.R (1)
5-12: LGTM - Test expectation updated for new play stat type.The row count expectation reflects an additional play stat type from the API. Note that this test is coupled to external data and may require future updates if the API schema changes.
R/helper_pbp_add_yardage.R (3)
55-58: Good defensive initialization of cleaned_text.This fallback ensures the function works correctly whether or not the caller has pre-processed the data with
clean_play_text().
62-113: Previous regex alternation issue resolved by restructuring.The lookbehind patterns have been correctly split into separate branches for "run" and "rush" variants, eliminating the parsing ambiguity flagged in the previous review. Each pattern is now unambiguous (e.g.,
"(?<= run for a loss of)"vs"(?<= rush for a loss of)").
158-167: Verify regex pattern combination for PASSER pass edge case.Lines 163-167 check for both
"pass$"(end of string) AND"^to "(start of string) in the samecleaned_text. This seems contradictory unlessclean_play_text()transforms the text in a way that makes both patterns matchable.Could you verify this logic works as intended with sample 2025 play data? If
cleaned_textcannot simultaneously end with "pass" and start with "to ", this branch will never match.tests/testthat/test-cfbd_play_types.R (1)
5-12: LGTM - Test expectation updated for new play type.The row count expectation reflects an additional play type from the API. Same consideration as the play stats types test regarding external data coupling.
Changed the default value of season_type to 'both' in cfbd_game_info and its documentation. Updated week validation to allow weeks 1-16 instead of 1-15.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
R/utils.R (1)
253-262: Error message doesn't match the updated validation range.The
range_checknow validates weeks 1-16, but the error message on line 260 still references "1-15". This will confuse users who enter an invalid week value.🔧 Proposed fix
if(!all(checks)){ - cli::cli_abort(glue::glue("Enter valid {deparse(substitute(week))} 1-15\n(14 for seasons pre-playoff, i.e. 2014 or earlier)")) + cli::cli_abort(glue::glue("Enter valid {deparse(substitute(week))} 1-16\n(14 for seasons pre-playoff, i.e. 2014 or earlier)")) }
🧹 Nitpick comments (2)
R/cfbd_play.R (1)
113-114:cfbd_plays()still defaults to"regular"whilecfbd_play_stats_player()defaults to"both".For consistency with the changes made to
cfbd_play_stats_player()(line 282) and other functions mentioned in NEWS.md, consider updatingcfbd_plays()to also default to"both". The documentation on line 59 would also need updating if this change is made.♻️ Suggested change for consistency
Line 59 (documentation):
-#' @param season_type (*String* default regular): Season type - regular, postseason, both, allstar, spring_regular, spring_postseason +#' @param season_type (*String* default both): Season type - regular, postseason, both, allstar, spring_regular, spring_postseasonLine 114 (function signature):
cfbd_plays <- function(year = 2020, - season_type = "regular", + season_type = "both", week = 1,R/cfbd_games.R (1)
258-262: Consider aligningseason_typedefaults across related functions.For awareness:
cfbd_game_info()andcfbd_game_media()now default to"both", whilecfbd_game_weather(),cfbd_game_player_stats(), andcfbd_game_team_stats()still default to"regular". If this inconsistency is intentional for this PR scope, no action needed—but you may want to align these in a follow-up for API consistency.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
DESCRIPTIONNEWS.mdR/cfbd_games.RR/cfbd_play.RR/utils.Rcran-comments.mdman/cfbd_game_info.Rdman/cfbd_play_stats_player.Rd
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-25T01:45:22.587Z
Learnt from: saiemgilani
Repo: sportsdataverse/cfbfastR PR: 113
File: R/cfbd_metrics.R:330-334
Timestamp: 2025-08-25T01:45:22.587Z
Learning: In cfbfastR, apparent inconsistencies in position validation lists between different CFBD API endpoints (like "C" vs "OC") are often intentional due to different data providers backing different endpoints. Don't flag these as issues to fix unless there's clear evidence of a functional problem.
Applied to files:
NEWS.mdcran-comments.md
📚 Learning: 2025-08-25T01:24:22.513Z
Learnt from: saiemgilani
Repo: sportsdataverse/cfbfastR PR: 113
File: R/cfbd_stats.R:381-381
Timestamp: 2025-08-25T01:24:22.513Z
Learning: In cfbfastR's cfbd_stats_season_advanced() function, the gsub("Opportunies", "_opportunities", colnames(df)) is intentional and correct for handling the actual API response column naming from CollegeFootballData.com, not a typo as it might appear.
Applied to files:
NEWS.mdman/cfbd_play_stats_player.RdR/cfbd_play.RDESCRIPTIONR/cfbd_games.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: ubuntu-latest (oldrel-1)
- GitHub Check: windows-latest (release)
- GitHub Check: ubuntu-latest (release)
🔇 Additional comments (7)
DESCRIPTION (1)
3-3: LGTM!Version bump to 2.2.0 is appropriate for the new features and bug fixes. Adding
patrickto Suggests is a good choice for parameterized testing support.Also applies to: 61-61
cran-comments.md (1)
7-8: LGTM!Release notes accurately summarize the
validate_week()bug fix and documentation updates for the CRAN submission.NEWS.md (1)
1-5: Release notes look good, but verifycfbd_plays()consistency.The NEWS entry documents
season_typedefault changes forcfbd_game_info()andcfbd_play_stats_player(). However,cfbd_plays()inR/cfbd_play.Rstill defaults to"regular"(line 114), whilecfbd_play_stats_player()now defaults to"both".Is this intentional, or should
cfbd_plays()also be updated for consistency?R/cfbd_play.R (1)
192-192: LGTM!The
cfbd_play_stats_player()function has been consistently updated with both documentation (line 192) and default parameter (line 282) changed to"both".Also applies to: 282-282
man/cfbd_game_info.Rd (1)
10-10: LGTM!The
season_typedefault is updated to"both"consistently in both the usage signature and argument description. This aligns with the source change inR/cfbd_games.Rline 132.Also applies to: 25-25
R/cfbd_games.R (1)
70-70: LGTM!The roxygen documentation (line 70) and function signature (line 132) are consistent in updating the
season_typedefault to"both"forcfbd_game_info().Also applies to: 132-132
man/cfbd_play_stats_player.Rd (1)
14-14: Documentation is consistent.The
season_typedefault of"both"is correctly applied in both the function signature (line 14) and the argument description (line 33). The sourceR/cfbd_play.Rfile has the matching roxygen2 documentation and function definition.
Summary by CodeRabbit
New Features
Bug Fixes
Behavior Changes
Tests / Chores
✏️ Tip: You can customize this high-level summary in your review settings.